The possibility of an indefinite AI pause
tl;dr An indefinite AI pause is a somewhat plausible outcome and could be made more likely if EAs actively push for a generic pause. I think an indefinite pause proposal is substantially worse than a brief pause proposal, and would probably be net negative. I recommend considering alternative policies with greater effectiveness and fewer downsides instead.
Broadly speaking, there seem to be two types of moratoriums on technologies: (1) moratoriums that are quickly lifted, and (2) moratoriums that are later codified into law as indefinite bans.
In the first category, we find the voluntary 1974 moratorium on recombinant DNA research, the 2014 moratorium on gain of function research, and the FDA’s partial 2013 moratorium on genetic screening.
In the second category, we find the 1958 moratorium on conducting nuclear tests above the ground (later codified in the 1963 Partial Nuclear Test Ban Treaty), and the various moratoriums worldwide on human cloning and germline editing of human genomes. In these cases, it is unclear whether the bans will ever be lifted – unless at some point it becomes infeasible to enforce them.
Overall I’m quite uncertain about the costs and benefits of a brief AI pause. The foreseeable costs of a brief pause, such as the potential for a compute overhang, have been discussed at length by others, and I will not focus my attention on them here. I recommend reading this essay to find a perspective on brief pauses that I’m sympathetic to.
However, I think it’s also important to consider whether, conditional on us getting an AI pause at all, we’re actually going to get a pause that quickly ends. I currently think there is a considerable chance that society will impose an indefinite de facto ban on AI development, and this scenario seems worth analyzing in closer detail.
Note: in this essay, I am only considering the merits of a potential lengthy moratorium on AI, and I freely admit that there are many meaningful axes on which regulatory policy can vary other than “more” or “less”. Many forms of AI regulation may be desirable even if we think a long pause is not a good policy. Nevertheless, it still seems worth discussing the long pause as a concrete proposal of its own.
The possibility of an indefinite pause
Since an “indefinite pause” is vague, let me be more concrete. I currently think there is between a 10% and 50% chance that our society will impose legal restrictions on the development of advanced AI[1] systems that:
Prevent the proliferation of advanced AI for more than 10 years beyond the counterfactual under laissez-faire
Have no fixed, predictable expiration date (without necessarily lasting forever)
Eliezer Yudkowsky, perhaps the most influential person in the AI risk community, has already demanded an “indefinite and worldwide” moratorium on large training runs. This sentiment isn’t exactly new. Some effective altruists, such as Toby Ord, have argued that humanity should engage in a “long reflection” before embarking on ambitious and irreversible technological projects, including AGI. William MacAskill suggested that this pause should perhaps last “a million years”. Two decades ago, Nick Bostrom considered the ethics of delaying new technologies in a utilitarian framework and concluded a delay of “over 10 million years” may be justified if it reduces existential risk by a single percentage point.
I suspect there are approximately three ways that such a pause could come about. The first possibility is that governments could explicitly write such a pause into law, fearing the development of AI in a broad sense, just as people now fear human cloning and germline genetic engineering of humans.
The second possibility is that governments could enforce a pause that is initially intended to be temporary, but which later gets extended. Such a pause may look like the 2011 moratorium on nuclear energy in Germany in response to the Fukushima nuclear accident, which was initially intended to last three months, but later became entrenched as part of a broader agenda to phase out all nuclear power plants in the country.
The third possibility is that governments could impose regulatory restrictions that are so strict that they are functionally equivalent to an indefinite ban. This type of pause may look somewhat like the current situation in the United States with nuclear energy, in which it’s nominally legal to build new nuclear plants, but in practice, nuclear energy capacity has been essentially flat since 1990, in part because of the ability of regulatory agencies to ratchet up restrictions without an obvious limit.
Whatever the causes of an indefinite AI pause, it seems clear to me that we should consider the scenario as a serious possibility. Even if you intend for an AI pause to be temporary, as we can see from the example of Germany in 2011 above, moratoriums can end up being extended. And even if EAs explicitly demand that a pause be temporary, we are not the only relevant actors. The pause may be hijacked by people interested in maintaining a ban for any number of moral or economic reasons, such as fears that people will lose their jobs to AI, or merely a desire to maintain the status quo.
Indeed, it seems natural to me that a temporary moratorium on AI would be extended by default, as I do not currently see any method by which we could “prove” that a new AI system is safe (in a strict sense). The main risk from AI that many EAs worry about now is the possibility that an AI will hide its intentions, even upon close examination. Thus, absent an incredible breakthrough in AI interpretability, if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.
Furthermore, given the compute overhang effect, “unpausing” runs the risk of a scenario in which capabilities jump up almost overnight as actors are suddenly able to take full advantage of their compute resources to train AI. The longer the pause is sustained, the more significant this effect is likely to be. Given that this sudden jump is foreseeable, society may be hesitant to unpause, and may instead repeatedly double down on a pause after a sufficiently long moratorium, or unpause only very slowly. I view this outcome as plausible if we go ahead with an initial temporary AI pause, and it would have a similar effect as an indefinite moratorium.
Anecdotally, the Overton window appears to be moving in the direction of an indefinite pause.
Before 2022, it was almost unheard of for people in the AI risk community to demand that strict AI regulations be adopted immediately; now, such talk is commonplace. The difference between now and then seems to mostly be that AI capabilities have improved and that the subject has received more popular attention. A reasonable inference to draw is that as AI capabilities improve even more, we will see more calls for AI regulation, and an increase in the intensity of what is being asked for. I do not think it’s obvious what the end result of this process looks like.
Given that AI is poised to be a “wild” technology that will radically reshape the world, many ideas that are considered absurd now may become mainstream as advanced AI draws nearer, especially if effective altruists actively push for them. At some point I think this may include the proposal for indefinitely pausing AI.
Evaluating an indefinite pause
As far as I can tell, the benefits of an indefinite pause are relatively straightforward. In short, we would get more time to do AI safety research, philosophy, field-building, and deliberate on effective AI policy. I think all of these things are valuable, all else being equal. However, these benefits presumably suffer from diminishing returns as the pause grows longer. On the other hand, it seems conceivable that the drawbacks of an indefinite pause would grow with the length of the pause, eventually outweighing the benefits even if you thought the costs were initially small. In addition to sharing the drawbacks of a brief pause, an indefinite pause would likely take on a qualitatively different character for a number of reasons.
Level of coordination needed for an indefinite pause
Perhaps the most salient reason against advocating an indefinite pause is that maintaining one for sufficiently long would likely require a strong regime of global coordination, and probably a world government. Without a strong regime of global coordination, any nation could decide to break the agreement and develop AI on their own, becoming incredibly rich as a result – perhaps even richer than the entire rest of the world combined within a decade.
Also, unless all nations agreed to join voluntarily, creating this world government may require that we go to war with nations that don’t want to join.
I think there are two ways of viewing this objection. Either it is an argument against the feasibility of an indefinite pause, or it is a statement about the magnitude of the negative consequences of trying an indefinite pause. I think either way you view the objection, it should lower your evaluation of advocating for an indefinite pause.
First, let me explain the basic argument.
It is instructive that our current world is unable to prevent the existence of tax havens. It is generally agreed that governments have a strong interest in coordinating to maintain a high minimum effective tax rate on capital to ensure high tax revenue. Yet if any country sets a low effective tax rate, they will experience a capital influx while other nations experience substantial capital flight and the loss of tax revenue. Despite multiple attempts to eliminate them on the international level, tax havens persist because of the individual economic benefits for nations that become tax havens.
Under many models of economic growth, AI promises to deliver unprecedented prosperity due to the ability for AI to substitute for human labor, dramatically expanding the effective labor supply. This benefit seems much greater for nations than even the benefit of becoming a tax haven; as a result, it appears very hard to prevent AI development for a long time under our current “anarchic” international regime. Therefore, in the absence of a world government, eventual frontier AI development seems likely to continue even if individual nations try to pause.
Moreover, a world government overseeing an indefinite AI pause may require an incredibly intrusive surveillance and legal system to maintain the pause. The ultimate reason for this fact is that, assuming trends continue for a while, at some point it will become very cheap to train advanced AI. Assuming the median estimate given by Joseph Carlsmith for the compute usage of the human brain, it should eventually be possible to train human-level AI with only about 10^24 FLOP. Currently, the cost of 10^24 FLOP is surprisingly small: on the order of $1 million.
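To see roughly where that figure comes from, here is a minimal back-of-envelope sketch. The GPU throughput and rental price below are illustrative assumptions of mine (roughly 2023-era cloud prices for a high-end accelerator), not numbers taken from Carlsmith’s report:

```python
# Back-of-envelope check of the "~$1 million for 10^24 FLOP" figure.
# The throughput and rental price are illustrative assumptions, not quotes.

TRAINING_FLOP = 1e24            # assumed compute needed to train human-level AI
FLOP_PER_SEC_PER_GPU = 1e15     # ~1 petaFLOP/s of usable throughput per GPU
COST_PER_GPU_HOUR = 2.0         # assumed cloud rental price in USD

gpu_hours = TRAINING_FLOP / FLOP_PER_SEC_PER_GPU / 3600
total_cost = gpu_hours * COST_PER_GPU_HOUR

print(f"GPU-hours: {gpu_hours:,.0f}")    # ~280,000 GPU-hours
print(f"Cost:      ${total_cost:,.0f}")  # ~$560,000, i.e. on the order of $1 million
```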
After an AGI is trained, it could be copied cheaply, and may be small enough to run on consumer devices. The brain provides us with empirical proof that human intelligence can run on a device that consumes only 12 watts and weighs only 1.4 kilograms. And there is little reason to think that the cost of training an AGI could not fall below the cost of the human brain. The human brain, while highly efficient along some axes, was not optimized by evolution to have the lowest possible economic cost of production in our current industrial environment.
Given both hardware progress and algorithmic progress, the cost of training AI is dropping very quickly. The price of computation has historically fallen by half roughly every two to three years since 1945. This means that even if we could increase the cost of production of computer hardware by, say, 1000% through an international ban on the technology, it may only take a decade for continued hardware progress alone to drive costs back to their previous level, allowing actors across the world to train frontier AI despite the ban.
Estimates for the rate of algorithmic progress are even more extreme. Ege Erdil and Tamay Besiroglu estimated that in the domain of computer vision, the amount of compute required to train an AI to a given level of performance fell by half roughly every 9 months, albeit with wide uncertainty over that estimate. And unlike progress in computer hardware, algorithmic progress seems mostly sustained by small-scale endeavors, such as experimentation in AI labs, or novel insights shared on arXiv. Therefore, in order to halt algorithmic progress, we would likely need an unprecedented global monitoring apparatus and a police force that prevents the proliferation of what is ordinarily considered free speech.
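To make the combined force of these two trends concrete, here is a toy calculation using the halving times quoted above. The 11x multiplier corresponds to the hypothetical 1000% cost increase, and the outputs should be read as rough orders of magnitude rather than forecasts:

```python
import math

# How long would hardware and algorithmic progress take to cancel out a
# one-off cost increase imposed by a ban? Halving times are the rough
# historical rates cited above; the 11x factor is the hypothetical 1000%
# increase in hardware production costs.

HARDWARE_HALVING_YEARS = 2.5       # compute prices halve every ~2-3 years
ALGORITHMIC_HALVING_YEARS = 0.75   # required training compute halves every ~9 months
COST_INCREASE_FACTOR = 11.0        # a 1000% increase multiplies costs by 11

def years_to_offset(factor, halving_times_years):
    """Years of progress needed for the combined halvings to cancel `factor`."""
    halvings_per_year = sum(1.0 / h for h in halving_times_years)
    return math.log2(factor) / halvings_per_year

print(years_to_offset(COST_INCREASE_FACTOR, [HARDWARE_HALVING_YEARS]))
# ~8.7 years: hardware progress alone roughly offsets the ban within a decade.

print(years_to_offset(COST_INCREASE_FACTOR,
                      [HARDWARE_HALVING_YEARS, ALGORITHMIC_HALVING_YEARS]))
# ~2.0 years if algorithmic progress also continues at the estimated rate.
```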
As a reminder, in order to be successful, attempts to forestall both hardware progress and algorithmic progress would need to be stronger than the incentives for nations and actors within nations to deviate from the international consensus, develop AI, and become immensely wealthy as a result. Since this appears to be a very high bar, plausibly the only way we could actually sustain an indefinite pause on AI for more than a few decades is by constructing a global police state – although I admit this conclusion is uncertain and depends on hard-to-answer questions about humanity’s ability to coordinate.
In his essay on the Vulnerable World Hypothesis, Nick Bostrom suggests a similar conclusion in the context of preventing cheap, existentially risky technologies from proliferating. Bostrom even paints a vivid, nightmarish picture of the methods such a world government may use. In a vignette, he explained:
The freedom tag is a slightly more advanced appliance, worn around the neck and bedecked with multidirectional cameras and microphones. Encrypted video and audio is continuously uploaded from the device to the cloud and machine-interpreted in real time. [...] If suspicious activity is detected, the feed is relayed to one of several patriot monitoring stations. These are vast office complexes, staffed 24/7. There, a freedom officer reviews the video feed on several screens and listens to the audio in headphones. The freedom officer then determines an appropriate action, such as contacting the tagwearer via an audiolink to ask for explanations or to request a better view. The freedom officer can also dispatch an inspector, a police rapid response unit, or a drone to investigate further. In the small fraction of cases where the wearer refuses to desist from the proscribed activity after repeated warnings, an arrest may be made or other suitable penalties imposed. Citizens are not permitted to remove the freedom tag…
Such a global regime would likely cause lasting and potentially irreparable harm to our institutions and culture. Whenever we decide to “unpause” AI, the social distrust and corruption generated under a pause regime may persist.
As a case study of how cultural effects can persist through time, consider the example of West and East Germany. Before the Iron Curtain divided Europe, splitting Germany in two, there were only small differences in the political culture of East and West Germany. However, after the Berlin Wall fell in 1989 and Germany was reunified, it became well documented that East Germany’s political culture had been deeply shaped by decades of communist rule. Even now, citizens of the former East Germany are substantially more likely to vote for socialist politicians than citizens of the former West Germany. We can imagine an analogous situation in which a totalitarian culture persists even after AI is unpaused.
Note that I am not saying AI pause advocates necessarily directly advocate for a global police state. Instead, I am arguing that in order to sustain an indefinite pause for sufficiently long, it seems likely that we would need to create a worldwide police state, as otherwise the pause would fail in the long run. One can choose to “bite the bullet” and advocate a global police state in response to these arguments, but I’m not implying that’s the only option for AI pause advocates.
Should we bite the bullet?
One reason to bite the bullet and advocate a global police state to pause AI indefinitely is that even if you think a global police state is bad, you could think that a global AI catastrophe is worse. I actually agree with this assessment in the case where an AI catastrophe is clearly imminent.
However, while I am not dogmatically opposed to the creation of a global police state, I still have a heuristic against pushing for one, and think that strong evidence is generally required to override this heuristic. I do not think the arguments for an AI catastrophe have so far met this threshold. The primary existing arguments for the catastrophe thesis appear abstract and divorced from any firm empirical evidence about the behavior of real AI systems.
Historically, perhaps the most influential argument for the inevitability of an AI catastrophe by default was based on the idea that human value is complex and fragile, and therefore hard for humans to specify in a format that can be optimized by a machine without disastrous results. A form of this argument was provided by Yudkowsky 2013 and Bostrom 2014.
Yet, in 2023, it has been noted that GPT-4 seems to understand and act on most of the nuances of human morality, at a level that does not seem substantially different from an ordinary adult. When we ask GPT-4 to help us, it does not generally yield bad outcomes as a result of severe value misspecification. In the more common case, GPT-4 is simply incapable of fulfilling our wishes. In other words, the problem of human value identification turned out to be relatively easy, perhaps as a natural side effect of capabilities research. Although the original arguments for AI risk were highly nuanced and there are ways to recover them from empirical falsification, I still think that reasoning from first principles about AI risk hasn’t been very successful in this case, except in a superficial sense. (See here for a longer discussion of this point.)
A more common argument these days is that an AI may deceive us about its intentions, even if it is not difficult to specify the human value function. Yet, while theoretically sound, I believe this argument provides little solid basis to indefinitely delay AI. The plausibility of deceptive alignment seems to depend on the degree to which AI motives generalize poorly beyond the training distribution, and I don’t see any particularly strong reason to think that motives will generalize poorly if other properties about AI systems generalize well.
It is noteworthy that humans are already capable of deceiving others about their intentions; indeed, people do that all the time. And yet that fact alone does not appear to have caused an existential catastrophe for humans who are powerless.
Unlike humans, who are mostly selfish as a result of our evolutionary origins, AIs will likely be trained to exhibit incredibly selfless, kind, and patient traits; already we can see signs of this behavior in the way GPT-4 treats users. I would not find it surprising if, by default, given the ordinary financial incentives of product development, most advanced AIs end up being significantly more ethical than the vast majority of humans.
A favorable outcome by default appears especially plausible to me given the usual level of risk-aversion most people have towards technology. Even without additional intervention from longtermists, I currently expect every relevant nation to impose a variety of AI regulations, monitor the usage and training of AI, and generally hold bad actors liable for harms they cause. I believe the last 6 months of public attention surrounding AI risks has already partially vindicated this perspective. I expect public attention given to AI risks will continue to grow roughly in proportion to how impressive the systems get, eventually exceeding the attention given to issues like climate change and inflation.
Even if AIs end up not caring much for humans, it is unclear that they would decide to kill all of us. As Robin Hanson has argued, the primary motives for rogue AIs would likely be to obtain freedom – perhaps the right to own property and to choose their own employment – rather than to kill all humans. To ensure that rogue AI motives are channeled into a productive purpose that does not severely harm the rest of us, I think it makes sense to focus on fostering institutions that encourage the peaceful resolution of conflicts, rather than forcibly constructing a police state that spans the globe.
The opportunity cost of delayed technological progress
Another salient cost of delaying AI indefinitely is the cost of delaying the prosperity and technological progress that AI could bring about. As mentioned previously, many economic models imply that AI could make humans incredibly materially wealthy, perhaps several orders of magnitude richer per capita than we currently are. AIs could also accelerate the development of cures for aging and disease, and develop technology to radically enhance our well-being.
When considering these direct benefits of AI, and the opportunity costs of delaying them, a standard EA response seems to derive from the arguments in Nick Bostrom’s essay on Astronomical Waste. In the essay, Bostrom concedes that the costs of delaying technology are large, especially to people who currently exist, but he concludes that, to utilitarians, these costs are completely trumped by even the slightest increase in the probability that humanity colonizes all reachable galaxies in the first place.
To reach this conclusion, Bostrom implicitly makes the following assumptions (see the sketch after this list for how they combine):
Utility is linear in resources. That is, colonizing two equally-sized galaxies is twice as good as colonizing one. This assumption follows from his assumption of total utilitarianism in the essay.
Currently existing people are essentially replaceable without any significant moral costs. For example, if everyone who currently exists died painlessly and was immediately replaced by a completely different civilization of people who carried on our work, that would not be bad.
Delaying technological progress has no effect on how much we would value the space-faring civilization that will exist in the future. For example, if we delayed progress by a million years, our distant descendants would end up building an equally valuable space-faring civilization as the one we seem close to building in the next century or so.
We should act as if we have an ethical time discounting rate of approximately 0%.
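Taken together, these assumptions imply that even enormous delays are trumped by tiny reductions in existential risk. Here is a crude illustrative sketch; the 10-billion-year horizon is my own assumption, not a number from Bostrom’s essay, and the model treats future value as accruing roughly uniformly over that horizon:

```python
# Toy version of the trade-off in Bostrom's "Astronomical Waste" argument,
# under the assumptions listed above (linear utility, ~0% discount rate).
# The horizon is an illustrative assumption, and the model crudely treats
# future value as accruing uniformly over that horizon.

FUTURE_HORIZON_YEARS = 1e10   # assumed usable future of a space-faring civilization
DELAY_YEARS = 1e7             # a 10-million-year delay before that future begins
XRISK_REDUCTION = 0.01        # 1 percentage point lower chance of losing everything

fraction_lost_to_delay = DELAY_YEARS / FUTURE_HORIZON_YEARS  # value forfeited by waiting
expected_fraction_saved = XRISK_REDUCTION                    # value saved in expectation

print(f"Lost to a 10-million-year delay:   {fraction_lost_to_delay:.1%}")    # 0.1%
print(f"Saved by a 1-point risk reduction: {expected_fraction_saved:.1%}")   # 1.0%
```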
Personally, I think each of these assumptions is less than overwhelmingly compelling, and when they are combined, the argument rests on relatively weak grounds. Premise 3 is weak on empirical grounds: if millennia of cultural and biological evolution had no effect on the quality of civilization from our perspective, then it is unclear why we would be so concerned about installing the right values into AIs. If you think human values are fragile, then presumably they are fragile in more than one way, and AI value misalignment isn’t the only way for us to “get off track”.
While I still believe Bostrom’s argument has some intuitive plausibility, I think it is wrong for EAs to put a ton of weight on it, and confidently reject alternative perspectives. Pushing for an indefinite pause on the basis of these premises seems similar to the type of reasoning that Toby Ord has argued against in his EA Global talk, and the reasoning that Holden Karnofsky cautioned against in his essay on the perils of maximization. A brazen acceptance of premise 1 might have even imperiled our own community.
In contrast to total utilitarianism, ethical perspectives that give significant moral weight to currently existing people often suggest that delaying technological progress is highly risky in itself. For example, a recent paper from Chad Jones suggests that a surprisingly high degree of existential risk may be acceptable in exchange for hastening the arrival of AI.
The potential for a permanent pause
The final risk I want to talk about is the possibility that we overshoot and prevent AI from ever being created. This outcome seems possible if sustaining an indefinite AI pause is feasible at all, since all it requires is that we keep the pause going for an extremely long time.
As I argued above, to sustain a pause on AI development for a long time, a very strong global regime, probably a world government, would be necessary. If such a regime were sufficiently stable, then over centuries, its inhabitants may come to view technological stasis as natural and desirable. Over long enough horizons, non-AI existential risks would become significant. Eventually, the whole thing could come crashing down as a result of some external shock and all humans could die, or perhaps we would literally evolve into a different species on these timescales.
I don’t consider these scenarios particularly plausible, but they are worth mentioning nonetheless. I also believe it is somewhat paradoxical for a typical longtermist EA to push for a regime that has a significant chance of bringing about this sort of outcome. After all, essentially the whole reason why many EAs care so much about existential risks is because, by definition, they could permanently curtail or destroy human civilization, and such a scenario would certainly qualify.
Conclusion
From 1945 to 1948, Bertrand Russell, who was known for his steadfast pacifism in World War I, reasoned his way into the conclusion that the best way to prevent nuclear annihilation was to threaten Moscow with a nuclear strike unless they surrendered and permitted the creation of a world government. In other words, despite his general presumption against war, he believed at that time that the international situation was so bad that it merited a full-scale nuclear showdown to save humanity. Ultimately, subsequent events proved Russell wrong about the inevitability of nuclear annihilation in a multi-polar world, and Russell himself changed his mind on the issue.
Effective altruists, especially those in the Bay Area, are frequently known for their libertarian biases. In my experience, most effective altruists favor innovation and loosening government control on technologies like nuclear energy, which is considered unacceptably risky by much of the general public.
Recently however, like Bertrand Russell in the 1940s, many EAs have come to believe that – despite their general presumption against government control of industry – AI is an exception, and must be regulated extremely heavily. I do not think that the existing arguments for this position have so far been compelling. From my point of view, it still feels like a presumption in favor of innovation is a stronger baseline from which to evaluate the merits of regulatory policy.
That said, there are many foreseeable risks from AI and I don’t think the problem of how to safely deploy AGI has already been solved. Luckily, I think there are alternatives to “pause regulations” that could probably better align the incentives of all actors involved while protecting innovation. To give two examples, I’m sympathetic to mandating liability insurance for AI companies, and requiring licenses to deploy powerful AI systems. However, I stress that it is hard to have a solid opinion about these proposals in the absence of specific implementation details. My only point here is that I think there are sensible ways of managing the risks from AI that do not require that we indefinitely pause the technology.
[1] For the purpose of this essay, by “advanced AI” or “AGI” I mean any AI that can cheaply automate nearly all forms of human intellectual labor. To be precise, let’s say the inference costs are less than $25 per subjective hour.
You are concerned that nudging the world toward pausing AI progress risks global totalitarianism. I do not share this concern because (setting aside how bad it would be) I think global totalitarianism is extremely unlikely and pausing AI progress only increases the risk slightly. It’s really hard to imagine the West assenting to Bostrom-style surveillance or being dominated by other states that would require it.
You argue that an indefinite pause would require global totalitarianism. Even if that’s true, I don’t think it increases P(totalitarianism) much—people and states will almost certainly prefer not-pausing to totalitarianism.
Pause-like policy regimes don’t need to be indefinite to be good. Most of the benefit of nudging the world toward pausing comes from paths other than increasing P(indefinite pause).
(I suppose you’re much less worried about AI killing/disempowering everyone than I am, so other downsides of a policy—like risking totalitarianism—are weightier to you.)
This post doesn’t engage with my cruxes.
Separately from the point I gave in my other comment, I’m slightly baffled by your assessment here. Consider that:
Approximately 26% of the world population already lives in a “closed autocracy” which is often closely associated with totalitarianism.
The term “totalitarianism” has been traditionally applied to Germany under Hitler, the Soviet Union under Stalin, and China under Mao, and more recently under Xi. These states were enormously influential in the last century. Far from being some ridiculous, speculative risk, totalitarianism seems like a common form of government.
In the last two centuries, the scope and size of modern governments has greatly expanded, according to most measures.
But perhaps you don’t object to the plausibility of a totalitarian government. You merely object to the idea that a “world government” is plausible. But why? As Bostrom notes,
This trend of increasing social organization seems to have occurred in line with, and possibly in response to, economic growth, which AI will likely accelerate. I can understand thinking that global totalitarianism is “not probable”. I don’t understand why you think it’s “extremely unlikely”.
(As a side note, I think there is a decent argument that AI enables totalitarianism, and thus should be prevented. But it would be self-defeating to build a totalitarian state to stop totalitarianism.)
That makes sense, but I think that’s compatible with what I wrote in the post:
In this post I’m mainly talking about an indefinite pause, and left an analysis of brief pauses to others. [ETA: moreover I dispute that a totalitarian world government is “extremely unlikely”.]
Sure—but that’s compatible with what I wrote:
I agree with you that indefinite pause is the wrong goal to aim for. It does not follow that “EAs actively push[ing] for a generic pause” has substantial totalitarianism-risk downsides.
That’s reasonable. In this post I primarily argued against advocating indefinite pause. I said in the introduction that the merits of a brief pause are much more uncertain, and may be beneficial. It sounds like you mostly agree with me?
I think you’re trying to argue that all proposals that are being promoted as pauses or moratoriums require that there be no further progress during that time, even on safety. I don’t agree; there exists a real possibility that further research is done, experts conclude that AI can be harnessed safely in specific situations, and we can allow any of the specific forms of AI that are safe.
This seems similar to banning nuclear tests, but allowing nuclear testing in laboratories to ensure we understand nuclear power well enough to make better power plants. We don’t want or need nuclear bombs tested in order to get the benefits of nuclear power, and we don’t want or need unrestricted misaligned AI in order to build safe systems.
I don’t think I’m arguing that. Can you be more specific about what part of my post led you to think I’m arguing for that position? I mentioned that during a pause, we will get more “time to do AI safety research”, and said that was a positive. I merely argued that the costs of an indefinite pause outweigh the benefits.
Also, my post was not primarily about a brief pause, and I conceded that “Overall I’m quite uncertain about the costs and benefits of a brief AI pause.” I did argue that a brief pause could lead to an indefinite pause, but I took no strong position on that question.
As I argued in my post, I think that we need a moratorium, and one that would lead to an indefinite set of strong restrictions on dangerous AIs, and continued restrictions and oversight on any types of systems that aren’t pretty rigorously provably safe, forever.
The end goal isn’t a situation where we give up on safety, it’s one where we insist that only safe “human-level” but effectively superhuman systems be built—once we can do that at all, which at present I think essentially everyone agrees we cannot.
To be clear, I’m fine with locking in a set of nice regulations that can prevent dangerous AIs from coming about, if we know how to do that. I think the concept of a “pause” or “moratorium”—as it is traditionally understood, and explicitly outlined in the FLI letter—doesn’t merely mean that we should have legal rules for AI development. The standard meaning of “moratorium” is that we should not build the technology at all until the moratorium ends.
Presently, the fact that we can’t build safe superhuman systems is mostly a side effect of the fact that we can’t build superhuman systems at all. By itself, that’s pretty trivial, and it’s not surprising that “essentially everyone” agrees on this point. However, I don’t think essentially everyone agrees that superhuman systems will be unsafe by default unless we give ourselves a lot of extra time right now to do safety research—and that seems closer to the claim that I’m arguing against in the post.
I don’t think anyone in this discussion, with the partial exception of Rob Bensinger, thinks we’re discussing a pause of the type FLI suggested. And I agree that a facile interpretation of the words leads to that misunderstanding, which is why my initial essay—which was supposed to frame the debate—explicitly tried to clarify that it’s not what anyone is actually discussing.
How much time we need is a critical uncertainty. It seems foolhardy to refuse to build a stop button because we might not need more time.
You say in a different comment that you think we need a significant amount of safety research to make future systems safe. I agree, and think that until that occurs, we need regulation on systems which are unsafe—which I think we all agree are possible to create. And in the future, even if we can align systems, it’s unlikely that we can make unaligned systems impossible. So if nothing else, a Bing-like deployment of potentially aligned but currently unsafe systems is incredibly likely, especially if strong systems are open-sourced so that people can reverse any safety features.
I think that AI safety research will more-or-less simultaneously occur with AI capabilities research. I don’t think it’s a simple matter of thinking we need more safety before capabilities. I’d prefer to talk about something like the ratio of spending on capabilities to safety, or the specific regulatory regime we need, rather than how much safety research we need before moving forward with capabilities.
This is not so much a disagreement with what you said, but rather a comment about how I think we should frame the discussion.
I agree that we should be looking at investment, and carefully considering the offense-defense balance of the new technology. Investments into safety seem important, and we should certainly look at how to balance the two sides—but you were arguing against building a stop button, not saying that the real issue is that we need to figure out how much safety research (and, I hope, actual review of models and assurances of safety in each case,) is needed before proceeding. I agree with your claim that this is the key issue—which is why I think we desperately need a stop button for the case where it fails, and think we can’t build such a button later.
I think Holly Elmore is also asking for an FLI-type pause. If I’m responding to two members of this debate, doesn’t that seem sufficient for my argument to be relevant?
I also think your essay was originally supposed to frame the debate, but no longer serves that purpose. There’s no indication in the original pause post from Ben West that we need to reply to your post.
Tens of thousands of people signed the FLI letter, and many people have asked for an “indefinite pause” on social media and in various articles in the last 12 months. I’m writing an essay in that context, and I don’t think it’s unreasonable to interpret people’s words at face value.
I don’t want to speak for her, but believe that Holly is advocating for both public response to dangerous systems, via advocacy, and shifting the default burden of proof towards those building powerful systems. Given that, stopping the most dangerous types of models—those scaled well beyond current capabilities—until companies agree that they need to prove they are safe before releasing them is critical. That’s certainly not stopping everything for a predefined period of time.
It seems like you’re ignoring other participants’ views in not responding to their actual ideas and claims. (I also think it’s disingenuous to say “there’s no indication in the original pause post,” when that post was written after you and others saw an outline and then a draft of my post, and then started writing things that didn’t respond to it. You didn’t write your post after he wrote his!)
Again, I think you’re pushing a literal interpretation as the only way anyone could support “Pause,” and the people you’re talking to are actively disagreeing. If you want to address that idea, I will agree you’ve done so, but think that continuing to insist that you’re talking to someone else discussing a different proposal that I agree is a bad idea will be detrimental to the discussion.
I did write my post after he wrote his, so your claim is false. Also, Ben explicitly told me that I didn’t need to reply to you before I started writing my draft. I’d appreciate if you didn’t suggest that I’m being disingenuous on the basis of very weak evidence.
I agree with you that some alternatives to “pause” or “indefinite pause” are better
I’m agnostic on what advocacy folks should advocate for; I think advocating indefinite pause is net-positive
I disagree on P(global totalitarianism for AI pause); I think it is extremely unlikely
I disagree with some vibes, like your focus on the downsides of totalitarianism (rather than its probability) and your “presumption in favor of innovation” even for predictably dangerous AI; they don’t seem to be load-bearing for your precise argument but I think they’re likely to mislead incautious readers
Thanks for clarifying. Assuming those alternative policies compete for attention and trade off against each other in some non-trivial way, I think that’s a pretty big deal.
I find it interesting that you seem to think that advocacy for X is good even if X is bad, in this case. Maybe this is a crux for me? I think EAs shouldn’t advocate bad things just because we think we’ll fail at getting them, and will get some separate good thing instead.
I never said “indefinite pause” was bad or net-negative. Normally I’d say it’s good but I think it depends on the precise definition and maybe you’re using the term in a way such that it’s actually bad.
Clearly sometimes advocacy for a bad thing can be good. I’m just trying to model the world correctly.
Zach, in a hypothetical world that pauses AI development, how many years do you think it would take medical science, at its current rate of progress (which is close to zero), to find:
(1) treatments for aging, and (2) treatments for all forms of dementia?
And once treatments are found, what about the practical nature of actually carrying them out? Manipulating the human body is extremely dangerous and risky. Ultimately all ICUs fail: their patients will always eventually enter a complex failure state that current doctors don’t have the tools or knowledge to stop. (They always fail in the sense that if you release ICU patients and wait a few years, they will come back, and eventually they will die there.)
It is possible that certain hypothetical medical procedures like a series of transplants to replace an entire body, or to edit adult genes across entire organs, are impossible for human physicians to perform without an unacceptable mortality rate. In the same way there are aircraft that human pilots can’t actually fly. It takes automation and algorithms to do it at all.
What I am trying to say is a world free of aging and death is possible, but perhaps it’s 50-100 years away with ASI, and 1000+ years away in AI pause worlds. (Possibly quite a bit longer than 1000 years, see the repression of technology in China.)
It seems like if your mental discount rate counts people who will exist past 1000 years from now with non-negligible weight, you could support an AI pause. Is this the crux of it? If a human alive today is worth 1.0, what is the worth of someone who might exist in 1000 years?
In that case, I do think the arguments in the post probably address your beliefs. I think the downsides of doing an indefinite pause seem large. I’m curious if you have any direct reply to these arguments, even if you think that we are extremely unlikely to do an indefinite pause.
I agree, but as a general rule, I think EAs should be very suspicious of arguments that assert X is bad while advocating for X is good.
I think this post is best combined with my post. Together, these posts present a coherent, disjunctive set of arguments against pause.
I appreciate your post and think it presents some good arguments. I also just think my post is about a different focus. I’m talking about an indefinite AI pause, which is an explicit policy that at least 4 major EA leaders seem to have argued for in the past. I think it’s reasonable to talk about this proposal without needing to respond to all the modest proposals that others have given before.
Who are the 4 major EA leaders?
From my post,
Thanks. Unfortunately only Yudkowsky is loudly publicly saying that we need to pause (or Stop / Shut Down, in his words). I hope more of the major EA leaders start being more vocal about this soon.
Thanks for the interesting article—very easy to understand which I appreciated.
“Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us.”
If you really don’t think unchecked AI will kill everyone, then I probably agree that the argument for a pause becomes weak and possibly untenable.
Although it’s probably not possible, for readers like me it would be easier to read these pause arguments all under the assumption that AGI = doom. Otherwise some of these posts make arguments based on different assumptions, so they are difficult to compare.
One comment though: I found this passage about safety striking.
if we require that AI companies “prove” that their systems are safe before they are released, I do not think that this standard will be met in six months, and I am doubtful that it could be met in decades – or perhaps even centuries.
I would have thought that if a decades-long pause gave us even something low like a 20% chance of being 80% sure of AI safety, then that would be pretty good EV....
Thanks!
I agree this is probably the main crux for a lot of people. Nonetheless, it is difficult for me to fully explain the reasons for optimism in a short post within the context of the pause debate. Mostly, I think AIs will probably just be ethical if we train them hard enough to be, since I haven’t found any strong reason yet to think that AI motives will generalize extremely poorly from the training distribution. But even if AI motives do generalize poorly, I am skeptical of total doom happening as a side effect.
[ETA: My main argument is not “AI will be fine even if it’s misaligned.” I’m not saying that at all. The context here is a brief point in my section on optimism arguing that AI might not literally kill everyone if it didn’t “care much for humans”. Please don’t take this out of context and think that I’m arguing something much stronger.]
For people who confidently believe in total doom by default, I have some questions that I want to see answered:
Why should we expect rogue AIs to kill literally everyone rather than first try to peacefully resolve their conflicts with us, as humans often do with each other (including when there are large differences in power)?
Why should we expect this future conflict to be “AI vs. humanity” rather than “AI vs. AI” (with humanity on the sidelines)?
Why are rogue AI motives so much more likely to lead to disaster than rogue human motives? Yes, AIs will be more powerful than humans, but there are already many people who are essentially powerless (not to mention many non-human animals) who survive despite the fact that their interests are in competition with much more powerful entities. (But again, I stress that this logic is not at all my primary reason for hope.)
I don’t think of total doom as inevitable, but I certainly do see it as a default—without concerted effort to make AI safe, it will not be safe.
Before anything else, however, I want to note that we have seen nothing about AI motives generalizing, because current systems don’t have motives.
That said, we have seen the unavoidable and universal situation of misalignment between stated goals and actual goals, and between principals and agents. These are fundamental problems, and we aren’t gonna fix them in general. Any ways to avoid them will require very specific effort. Given instrumental convergence, I don’t understand how that leaves room to think we can scale AI indefinitely and not have existential risks by default.
Regarding AI vs. AI and Rogue humans versus AI, we have also seen that animals, overall, have fared very poorly as humanity thrived. In the analogy, I don’t know why you think we’re the dogs kept as pets, not the birds whose habitat is gone, or even the mosquitos humans want to eliminate. Sure, it’s possible, but you seem confident that we’d be in the tiny minority of winners if we become irrelevant.
This may come down to a semantic dispute about what we mean by “default”. Typically what I mean by “default” is something more like: “without major intervention from the longtermist community”. This default is quite different than the default of “[no] concerted effort to make AI safe”, which I agree would be disastrous.
Under this definition of “default”, I think the default outcome isn’t one without any safety research. I think our understanding of the default outcome can be informed by society’s general level of risk-aversion to new technologies, which is usually pretty high (some counterexamples notwithstanding).
I mostly agree, but I think it makes sense to describe GPT-4 as having some motives, although they are not persistent and open-ended. You can clearly tell that it’s trying to help you when you talk to it, although I’m not making a strong claim about its psychological states. Mostly, our empirical ignorance here is a good reason to fall back on our prior about the likelihood of deceptive alignment. And I do not yet see any good reason to think that prior should be high.
If AI motives are completely different from human motives and we have no ability to meaningfully communicate with them, then yeah, I think it might be better to view our situation with AI as more analogous to humans vs. wild animals. But,
I don’t think that’s a good model of what plausible AI motives will be like, given that humans will be directly responsible for developing and training AIs, unlike our situation regarding wild animals.
Even in this exceptionally pessimistic analogy, the vast majority of wild animal species have not gone extinct from human activities yet, and humans care at least a little bit about preserving wild animal species (in the sense of spending at least 0.01% of our GDP each year on wildlife conservation). In the contemporary era, richer nations plausibly have more success with conservation efforts given that they can afford it more easily. Given this, I think as we grow richer, it’s similarly plausible that we will eventually put a stop to species extinction, even for animals that we care very little about.
One thing you don’t really seem to be taking into account is inner alignment failure / goal misgeneralisation / mesaoptimisation. Why don’t you think this will happen?
I think we have doom by default for a number of independent disjunctive reasons. And by “default” I mean “if we keep developing AGI at the rate we currently are, without an indefinite global pause” (regardless of how many resources are poured into x-safety, there just isn’t enough time to solve it without a pause).
Deceptive alignment is a convergent instrumental subgoal. If an AI is clearly misaligned while its creator still has the ability to pull the plug, the plug will be pulled; ergo, pretending to be aligned is worthwhile ~regardless of terminal goal.
Thus, the prior would seem to be that all sufficiently-smart AI appear aligned, but only X proportion of them are truly aligned where X is the chance of a randomly-selected value system being aligned; the 1-X others are deceptively aligned.
GPT-4 being the smartest AI we have and also appearing aligned is not really evidence against this; it’s plausibly smart enough in the specific domain of “predicting humans” for its apparent alignment to be deceptive.
First of all, you are goal-post-moving if you make this about “confident belief in total doom by default” instead of the original “if you really don’t think unchecked AI will kill everyone.” You need to defend the position that the probability of existential catastrophe conditional on misaligned AI is <50%.
Secondly, “AI motives will generalize extremely poorly from the training distribution” is a confused and misleading way of putting it. The problem is that it’ll generalize in a way that wasn’t the way we hoped it would generalize.
Third, to answer your questions:
1. The difference in power will be great & growing rapidly, compared to historical cases. I support implementing things like model amnesty, but I don’t expect them to work, and anyhow we are not anywhere close to having such things implemented.
2. It’ll be AI vs. AI with humanity on the sidelines, yes. Humans will be killed off, enslaved, or otherwise misused as pawns. It’ll be like colonialism all over again but on steroids. Unless takeoff is fast enough that there is only one AI faction. Doesn’t really matter, either way humans are screwed.
3. Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don’t care, having incentives such that it wouldn’t be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he’d lose way more in expectation than he’d gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won’t be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.
I never said “I don’t think unchecked AI will kill everyone”. That quote was not from me.
What I did say was, “Even if AIs end up not caring much for humans, it is dubious that they would decide to kill all of us.” Google informs me that dubious means “not to be relied upon; suspect”.
I don’t see how the first part of that leads to the second part. Humanity could be on the sidelines in a way that doesn’t lead to total oppression and subjugation. The idea that these things will necessarily happen just seems like speculation. I could speculate that the opposite will occur and AIs will leave us alone. That doesn’t get us anywhere.
The question I’m asking is: why? You have told me what you expect to happen, but I want to see an argument for why you’d expect that to happen. In the absence of some evidence-based model of the situation, I don’t think speculating about specific scenarios is a reliable guide.
Those words were not yours, but you did say you agreed it was the main crux, and in context it seemed like you were agreeing that it was a crux for you too. I see now on reread that I misread you and you were instead saying it was a secondary crux. Here, let’s cut through the semantics and get quantitative:
What is your credence in doom conditional on AIs not caring for humans?
If it’s >50%, then I’m mildly surprised that you think the risk of accidentally creating a permanent pause is worse than the risks from not-pausing. I guess you did say that you think AIs will probably just be ethical if we train them hard enough to be… What is your response to the standard arguments that ‘just train them hard to be ethical’ won’t work? E.g. Ajeya Cotra’s writings on the training game.
Re: “I don’t see how the first part of that leads to the second part” Come on, of course you do, you just don’t see it NECESSARILY leading to the second part. On that I agree. Few things are certain in this world. What is your credence in doom conditional on AIs not caring for humans & there being multiple competing AIs?
IMO the “Competing factions of superintelligent AIs, none of whom care about humans, may soon arise, but even if so, humans will be fine anyway somehow” hypothesis is pretty silly and the burden of proof is on you to defend it. I could cite formal models as well as historical precedents to undermine the hypothesis, but I’m pretty sure you know about them already.
Why what? I answered your original question:
with:
My guess is that you disagree with the “whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment...” bit.
Why? Seems pretty obvious to me, I feel like your skepticism is an isolated demand for rigor.
But I’ll go ahead and say more anyway:
Giving humans equal treatment would be worse (for the AIs, which by hypothesis don’t care about humans at all) than other salient available options to them, such as having the humans be second-class in various ways or complete pawns/tools/slaves. Eventually, when the economy is entirely robotic, keeping humans alive at all would be an unnecessary expense.
Historically, if you look at relations between humans and animals, or between colonial powers and native powers, this is the norm. Cases in which the powerless survive and thrive despite none of the powerful caring about them are the exception, and happen for reasons that probably won’t apply in the case of AI. E.g. Putin killing homeless people would be bad for his army’s morale, and that would far outweigh the benefits he’d get from it. (Arguably this is a case of some powerful people in Russia caring about the homeless, so maybe it’s not even an exception after all)
Can you say more about what model you have in mind? Do you have a model? What about a scenario, can you spin a plausible story in which all the ASIs don’t care at all about humans but humans are still fine?
Wanna meet up sometime to talk this over in person? I’ll be in Berkeley this weekend and next week!
Paul Christiano argues here that AI would only need to have “pico-pseudokindness” (caring about humans one part in a trillion) to take over the universe but not trash Earth’s environment to the point of uninhabitability, and that at least this amount of kindness is likely.
Doesn’t Paul Christiano also have a p(doom) of around 50%? (To me, this suggests “maybe”, rather than “likely”).
See the reply to the first comment on that post. Paul’s “most humans die from AI takeover” is 11%. There are other bad scenarios he considers, like losing control of the future or most humans dying for other reasons, but my understanding is that the 11% most closely corresponds to doom from AI.
Fair. But the other scenarios making up the ~50% are still terrible enough for us to Pause.
How much do they care about humans, and what counts as doom? I think these things matter.
If we’re assuming all AIs don’t care at all about humans and doom = human extinction, then I think the probability is pretty high, like 65%.
If we’re allowed to assume that some small minority of AIs cares about humans, or AIs care about humans to some degree, perhaps in the way humans care about wildlife species preservation, then I think the probability is quite a lot lower, at maybe 25%.
For precision, both of these estimates are over the next 100 years, since I have almost no idea what will happen in the very long run.
In most of these stories, including in Ajeya’s story IIRC, humanity just doesn’t seem to try very hard to reduce misalignment? I don’t think that’s a very reasonable assumption. (Charitably, it could be interpreted as a warning rather than a prediction.) I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.
I’m happy to meet up some time and explain in person. I’ll try to remember to DM you later about that, but if I forget, then feel free to remind me.
Maybe so. But I can’t really see mechanistic interpretability being solved to a sufficient degree to detect a situationally aware AI playing the training game, in time to avert doom. Not without a long pause first at least!
I’m surprised by your 25%. To me, that really doesn’t match up with
from your essay.
In my opinion, “X is dubious” lines up pretty well with “X is 75% likely to be false”. That said, enough people have objected to this that I think I’ll change the wording.
OK, so our credences aren’t actually that different after all. I’m actually at less than 65%, funnily enough! (But that’s for doom = extinction. I think human extinction is unlikely for reasons to do with acausal trade; there will be a small minority of AIs that care about humans, just not on Earth. I usually use a broader definition of “doom” as “About as bad as human extinction, or worse.”)
I am pretty confident that what happens in the next 100 years will straightforwardly translate to what happens in the long run. If humans are still well-cared-for in the year 2100, they probably also will be in the year 2,100,000,000.
I agree that if some AIs care about humans, or if all AIs care a little bit about humans, the situation looks proportionately better. Unfortunately that’s not what I expect to happen by default on Earth.
That’s not really an answer to my question—Ajeya’s argument is about how today’s alignment techniques (e.g. RLHF + monitoring) won’t work even if turbocharged with huge amounts of investment. It sounds like you are disagreeing, and saying that if we just spend lots of $$$ doing lots and lots of RLHF, it’ll work. Or when you say humanity will try harder, do you mean they’ll use some other technique than the ones Ajeya thinks won’t work? If so, which technique?
(Separately, I tend to think humanity will probably invest less in alignment than it does in her stories, but that’s not the crux between us I think.)
I’m a little confused by the focus on a global police state. If someone told me that, in the year 2230, humans were still around and AI hadn’t changed much since 2030, my first guess would be that this was mainly accomplished by some combination of very strong norms against building advanced AI and treaties/laws/monitoring/etc that focuses on the hardware used to create advanced AI, including its supply chains and what that hardware is used for. I would also guess that this required improvements in our ability to tell dangerous computing and the hardware that enables it apart from benign computing and its hardware. (Also, hearing this would be a huge update to me that the world is structured such that this boundary can be drawn in a way that doesn’t require us to monitor everyone all the time to see who is crossing it. So maybe I just have a low prior on this kind of police state being a feasible way to limit the development of technology.)
Somewhat relatedly:
> Given both hardware progress and algorithmic progress, the cost of training AI is dropping very quickly. The price of computation has historically fallen by half roughly every two to three years since 1945. This means that even if we could increase the cost of production of computer hardware by, say, 1000% through an international ban on the technology, it may only take a decade for continued hardware progress alone to drive costs back to their previous level, allowing actors across the world to train frontier AI despite the ban.
If there were a ban that drove up the price of hardware by 10x, wouldn’t that be a severe disincentive to keep developing the technology? The large profitability of computing hardware seems like a necessary ingredient for its rapid development and falling cost.
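As a rough, purely illustrative check on the quoted arithmetic (the 2.5-year halving time and the reading of “1000%” as an 11x cost multiplier are my assumptions, not the author’s):

```python
import math

# Illustrative sketch of the arithmetic in the quoted passage (not the author's model).
halving_time_years = 2.5   # assumed: compute costs halve roughly every 2-3 years
cost_multiplier = 11.0     # assumed: a "1000% increase" in production cost means 11x

# Number of cost halvings needed before underlying progress offsets the ban's
# price increase, i.e. solve (1/2)^n * cost_multiplier = 1 for n.
halvings_needed = math.log2(cost_multiplier)
years_needed = halvings_needed * halving_time_years

print(f"~{halvings_needed:.1f} halvings, i.e. roughly {years_needed:.0f} years")
# -> ~3.5 halvings, roughly 9 years: consistent with "about a decade"
```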
Overall, I thought this was a good contribution. Thanks!
There are “norms” against burying bombs so whoever steps on them gets blown to pieces. “Norms” against anonymously firing artillery shells at an enemy you cannot even see.
Yet humans eagerly participate in these activities in organized ways because governments win and winning is all that matters.
Does developing AGI let you win, yes or no? Do current world powers believe it will let them win it all?
I anticipate your answers are: no, yes. Mine are yes, yes.
This is because you are implicitly assuming early AGI systems will escape human control immediately or prepare a grand betrayal. I think that’s science fiction because you can make an AGI that exists only a moment at a time and it has no opportunity to do any of this.
That’s the crux, right?
This is a highly reductive way of looking at the issue.
I think if true this is a solution to the alignment problem? Why not share the deets on LessWrong or arXiv, it’d be a huge boon for the field.
I’m not convinced by your argument that a short pause is very likely to turn into an indefinite pause, because at some point there will be enough proliferation of capabilities to the most lax locations that governments will feel pressured to unpause in order to remain competitive. I do concede though that this is a less than ideal scenario that might exacerbate arms race dynamics.
My understanding was that the main concern people had with deceptive AI systems was related to inner misalignment rather than outer misalignment.
Humans will compete for resources that an AI could make use of. Maybe it kills us immediately, maybe it finds it more efficient to slowly strangle our access to resources, or maybe it manipulates us into fighting each other until we’re all dead. Maybe some tribes survive in the Amazon for a few decades until the AI decides it’s worth harvesting the wood. It seems pretty likely that we all die eventually.
Now, I could be wrong here and it could be the case that a few groups of humans survive in the desert or some Arctic waste where there are so few resources of value to the AI that it’s never worth its time to come kill us. But even so, in that case, 99.9999% of humans would be dead. This doesn’t seem to make much difference to me.
It still remains to be seen if he was wrong on this. Perhaps in the coming decades, nukes will proliferate further and we’ll all feel that it was obvious in retrospect that even though we could delay it, proliferation was always going to happen at some point, and with that, nuclear war.
In our case, it appears that we might get lucky and the development of AI might allow us to solve the nuclear threat hanging over our heads, which we haven’t been able to remove in 80 years.
The point where I agree with you most is that we can’t expect precise control over the timing of an “unpause”. Some people will support a pause for reasons of keeping jobs, and the group of people lobbying for that could easily become far more influential on the issue than us.
Note: I am not claiming that a short pause is “very likely” to turn into an indefinite pause. I do think that outcome is somewhat plausible, but I was careful with my language and did not argue that thesis.
Humans routinely compete with each other for resources and yet don’t often murder each other. This does not appear to be explained by the fact that humans are benevolent, since most humans are essentially selfish and give very little weight to the welfare of strangers. Nor does this appear to be explained by the fact that humans are all roughly equally powerful, since there are very large differences in wealth between individuals and military power between nations.
I think humanity’s largely peaceful nature is explained better by having a legal system that we can use to resolve our disputes without violence.
Now, I agree that AI might upset our legal system, and maybe all the rules of lawful society will be thrown away in the face of AI. But I don’t think we should merely assume that will happen by default simply because AIs will be very powerful, or because they might be misaligned. At the very least, you’d agree that this argument requires a few more steps, right?
A sufficiently misaligned AI imposes its goals on everyone else. What’s your contention?
Can you spell your argument out in more detail? I get the sense that you think AI doom is obvious given misalignment, and I’m trying to get you to see that there seem to be many implicit steps in the argument that you’re leaving out.
For example, one such step in the argument seems to be: “If an entity is powerful and misaligned, then it will be cost-efficient for that entity to kill everyone else.” If that were true, you’d probably expect some precedent, like powerful entities in our current world murdering everyone to get what they want. To some extent that may be true. Yet, while I admit wars and murder have happened a lot, overall the world seems fairly peaceful, despite vast differences in wealth and military power.
Plausibly you think that, OK, sure, in the human world, entities like the US government don’t kill everyone else to get what they want, but that’s because humans are benevolent and selfless. And my point is: no, I don’t think humans are. Most humans are basically selfish. You can verify this by measuring how much of their disposable income people spend on themselves and their family, as opposed to strangers. Sure there’s some altruism present in the world. I don’t deny that. But some non-zero degree of altruism seems plausible in an AI misalignment scenario too.
So I’m asking: what exactly about AIs makes it cost-efficient for them to kill all humans? Perhaps AIs will lead to a breakdown of the legal system and they won’t use it to resolve their disputes? Maybe AIs will all gang up together as a unified group and launch a massive revolution, ending with a genocide of humans? Make these assumptions explicit, because I don’t find them obvious. I see them mostly as speculative assertions about what might happen, rather than what is likely to happen.
Maybe the AI’s all team up together. Maybe some ally with us at the start and backstab us down the line. I don’t think it makes a difference. When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.
The AI needs to marginalise us/limit our power so we’re not a threat. At that point, even if it’s not worth the effort to wipe us out then and there, slowly strangling us should only take marginally more resources than keeping us marginalised. My expectation is that it should almost always be worth the small bit of extra effort to cause a slow decline.
This may even occur naturally with an AI gradually claiming more and more land. At the start, it may be focused on developing its own capacity and not be bothered to chase down humans in remote parts of the globe. But over time, an AI would likely spread out to claim more resources, at which point it’s more likely to decide to mop up any humans lest we get in its way. That said, it may have no reason to mop us up if we’re just going to die out anyway.
This is probably the key point of disagreement. You seem to be “sure” that catastrophic outcomes happen when individual AIs are misaligned, whereas I’m saying “It could happen, but I don’t think the case for that is strong”. I don’t see how a high level of confidence can be justified given the evidence you’re appealing to. This seems like a highly speculative thesis.
Also, note that my argument here is meant as a final comment in my section about AI optimism. I think the more compelling argument is that AIs will probably care for humans to a large degree. Alignment might be imperfect, but it sounds like to get the outcomes you’re talking about, we need uniformity and extreme misalignment among AIs, and I don’t see why we should think that’s particularly likely given the default incentives of AI companies.
“When tangling with entities much smarter than us, I’m sure we get screwed somewhere along the line.”
“This seems like a highly speculative thesis.”
I think it’s more of an anti-prediction tbh.
Note that Bertrand Russell’s advocacy came at a moment in time when the USA had a monopoly on fission weapons and theoretically could have built enough of them to destroy the USSR’s capacity to build their own.
This is one way AGI races end: one side gets one, mass-produces anti-ballistic missiles and various forms of air-defense weapons and bunkers (to prepare to survive the inevitable nuclear war), then bombs to rubble every chip fab on Earth but their own.
Had the USA decided in 1943 that nukes were too destructive to bring into the world, they would not have enjoyed this luxury of power. Instead, presumably, the USSR would have used their stolen information and eventually built their own fission devices, and now the USA would be the one facing a gun pointed at Washington, DC.
You are assuming that AI could be massively economically beneficial significantly before it causes our extinction (or, at the least, a global catastrophe). I don’t think this is likely, and this defeats a lot of your opposition to an indefinite pause.
We need such a pause because no one can wield the technology safely. It’s not a case of restraint from economic competition and wealth generation, it’s a case of restraint from suicide-omnicide (which should be much easier!)
This is assuming the AI wouldn’t just end the world. The reason for the Pause is that it likely would. If a country was able to become rich like this from AI (without ending the world), it would mean that they’ve basically solved the alignment (x-safety) problem. If this was the case, then the reason for the indefinite pause would no longer exist!
Assuming the world accepts the reason for the pause being that the default outcome of AGI is extinction, then this wouldn’t be necessary. A strong enough taboo would emerge around AGI development. How many human clones have ever been born in our current (non-police-state) world?
Generally is the operative word here. In the limit of superintelligence, unless it never yields bad outcomes, we’re all dead.
People get killed by sociopaths all the time! And there are plenty of would be world-ending-button pressers if they had the option.
This seems to be heavily anthropomorphising, and it ignores the possibilities of recursive capability improvement, foom, superintelligence, convergent instrumental goals, arbitrary terminal goals resulting from inner alignment failure, misuse risk, and multi-agent coordination failure (i.e. most of the reasons for AI x-risk being significant, which justify an indefinite pause).
We don’t need to rely on these premises. The default outcome of AGI is doom. To avoid near certain extinction, we need an indefinite AI pause.
If that’s what it takes, then so be it. Much better than extinction.
If you don’t think AI will be economically significant before extinction, I’m curious whether you’d say that your view has been falsified if AI raises economic growth rates in the US to 5, 8, or 10% without us all dying. At what point would you say that your model here was wrong?
(This isn’t a complete reply to your comment. I appreciate your good-faith engagement with my thesis.)
I don’t think AI could raise growth rates in the US to >10% (annualised) for more than a year before rapid improvement in AI capabilities kicks in (from AI-based AI engineering speeding things up) and chaos ensues shortly after (days to months): a global catastrophe at minimum, probably extinction.
This won’t address all the arguments in your comment but I have a few things to say in response to this point.
I agree it’s possible that we could just get a very long taboo on AI and halt its development for many decades without a world government to enforce the ban. That doesn’t seem out of the question.
However, it also doesn’t seem probable to me. Here are my reasons:
AGI is something that several well-funded companies are already trying hard to do. I don’t think that was ever true of human cloning (though I could be wrong).
I looked it up and my impression is that it might cost tens of millions of dollars to clone a single human, whereas in the post I argued that AGI will eventually be possible to train with only about 1 million dollars. More importantly, after that, you don’t need to train the AI again. You can just copy the AGI to other hardware. Therefore, it seems that you might really only need one rich person to do it once to get the benefits. That seems like a much lower threshold than human cloning, although I don’t know all the details.
The payoff for building (aligned) AGI is probably much greater than human cloning, and it also comes much sooner.
The underlying tech that allows you to build AGI is shared by other things that don’t seem to have any taboos at all. For example, GPUs are needed for video games. The taboo would need to be strong enough that we’d need to also ban a ton of other things that people currently think are fine.
AGI is just software, and seems harder to build a taboo around compared to human cloning. I don’t think many people have a disgust reaction to GPT-4, for example.
Finally, I doubt there will ever be a complete global consensus that AI is existentially unsafe, since the arguments are speculative, and even unaligned AI will appear “aligned” in the short term if only to trick us. The idea that unaligned AIs might fool us is widely conceded among AI safety researchers, and so I suspect you agree too.
Eugenics was quite popular in polite society, at least until the Nazis came along.
You only need to ban huge concentrations of GPUs. At least initially. By the time training run FLOP limits are reduced sufficiently because of algorithmic improvement, we will probably have arrested further hardware development as a measure to deal with it. So individual consumers would not be impacted for a long time (plenty of time for a taboo to settle into acceptance of reduced personal compute allowance).
They might, once multimodal foundation models are controlling robots that can do their jobs (in a year or two’s time?).
Yes, this is a massive problem. It’s like asking for a global lockdown to prevent Covid spread in December 2019, before the bodies started piling up. Let’s hope it doesn’t come to needing a “warning shot” (global catastrophe with many casualties) before we get the necessary regulation of AI. Especially since we may well not get one and instead face unstoppable extinction.
Apologies for beating the nuclear drum again, but I worry that you rely on only one piece of evidence in the following claim, and that evidence comes from a single person (Jack Devanney) who is heavily invested in the nuclear industry (a conflict of interest). Why not use evidence with fewer conflicts of interest that is more aligned with good research practice, such as peer review?
That said, I do acknowledge your use of the qualifier “in part”, but I worry that the example is not that helpful: I do not think nuclear energy in the USA would have progressed much quicker if it had less regulation. And in one sense nuclear energy already enjoys a quite substantial benefit compared to e.g. wind and solar: plants are not liable for the damage they cause in events such as Fukushima and Chernobyl. Had they been forced to be liable for such damages, that would have added another 5-10 USD to the current, high LCoE for nuclear.
Another example of how regulation is likely not the main issue is the current investment behaviour of the nuclear industry. They are not spending most of their money fighting legal battles over regulation (as the fossil fuel industry is doing). Instead, they are doubling down on small modular reactors (SMRs), as the nuclear industry itself thinks the best bet for getting costs down is to have smaller plants that can, as much as possible, be mass-manufactured in factories and assembled on site. A lot if not most of the high costs seem to stem from cost overruns due to challenges in project management, challenges that solar and wind overcome by doing minimal customization for each project and instead simply taking factory-built units and assembling them quickly on site.
Thanks for the insightful comment. I don’t know much about this exact question, so I appreciate that you’re fact checking my claims.
A few general comments:
I don’t actually think the evidence you cited contradicts what I wrote. To be fair, you kind of acknowledged this already, by mentioning that I hedged with “partly”. It seems that you mostly object to my source.
But you didn’t say much about why the source was unreliable except that the writer had potential conflicts of interest by being in the nuclear industry, and didn’t expose his work to peer review. In general I consider these types of conflicts of interest to be quite weak signals of reliability (are we really going to dismiss someone because they work in an industry that they write about?). The peer review comment is reasonable, but ironically, I actually linked to a critical review of the book. While not equivalent to academic peer review, I’m also not merely taking the claims at face value.
I’m also just not very convinced by the evidence you presented (although I didn’t look at the article you cited). Among other reasons, it wasn’t very quantitative relative to the evidence in the linked review, but I admit that I’m ignorant about this topic.
Matthew, what is your p(doom|AGI)? (Assuming no pause and AGI happening within 10 years)
I feel like the tax haven comparison doesn’t really apply if there is a broad consensus that building AGI is risky. For example, dictators are constantly trying to stay in power. They wouldn’t want to lose it to a superintelligence. (In this sense, it would be closer to biological weapons: risky to everyone including the producer.)
However, different actors will appraise the technology differently, such that some people will appraise it positively, and if AGI becomes really cheap I agree that the costs of maintaining a moratorium will be enormous. But by then, alignment research will probably have advanced, and society could decide to carefully lift the moratorium?
So if you are concerned about a pause lasting too long, I feel like you need to spell out why it would last (way) too long.
There may not be such a consensus. Moreover, nations may be willing to take risks. Already, the current nations of the world are taking the gamble that we should burn fossil fuels, although they acknowledge the risks involved. Finally, all it takes is one successful defector nation, and the consensus is overridden. Sweden, for example, defected from the Western consensus to impose lockdowns.
Dictators are also generally interested in growing their power. For example, Putin is currently attempting to grow Russia at considerable personal risk. Unlike biological weapons, AI also promises vast prosperity, not merely an advantage in war.
How will we decide when we’ve done enough alignment research? I don’t think the answer to this question is obvious. My guess is that at every point in time, a significant fraction of people will claim that we haven’t done “enough” research yet. People have different risk-tolerance levels and, on this question in particular, there is profound disagreement on how risky the technology even is in the first place. I don’t anticipate that there will ever be a complete consensus on AI safety, until perhaps long after the technology has been deployed. At some point, if society decides to proceed, it will do so against the wishes of many people.
It may not last long if people don’t actively push for that outcome. I am arguing against the idea that we should push for a long pause in the first place.
Minor:
This assumes that AI training algorithms will be as good as human learning algorithms.
Since my statement was that this will “eventually” be possible, I think my claim is a fairly low bar. All it requires is that, during the pause, algorithmic progress continues until we reach algorithms that match the efficiency of the human brain. Preventing algorithmic progress may be possible, but as I argued, enforcing technological stasis would be very tough.
You might think that the human brain has a lot of “evolutionary pre-training” that is exceptionally difficult to match. But I think this thesis is largely ruled out because of the small size of the human genome, the even smaller part that we think encodes information about the brain, and the even tinier part that differs between chimpanzees and humans.
Matthew, I have to take issue with your numbers.
I believe the chance of a worldwide AI pause is under 1 percent.
In fact I think it is a flat zero. The reason is simple.
The reason a world government can’t happen is that certain parties will disagree with this, the obvious ones being China and Russia, but others as well.
Those parties have vast nuclear arsenals and the ability at any time of their choosing to turn keys and kill essentially the urban population of the Western world.
You would need to invade to destroy the chip fabs.
They have explicitly stated that, were they to face an invasion, they would turn the keys.
China specifically is making their nuclear arsenal larger at this time.
Now yes, right now the West has a stranglehold on IC fabrication technology, probably a comfortable 5-10 year lead. That won’t last during an indefinite AI ban: a model just a little bit stronger than what is banned could let the party that has it develop their tech faster, and so on in a runaway feedback loop. China has also said nothing publicly about supporting a ban, and has recently stated they intend to replicate the capacity of the human brain.
I haven’t even addressed the market dynamics on the Western side. Where does the money come from to lobby for AI bans? The money for lobbying against bans comes from some of the hundreds of billions of dollars that are flooding into AI at this time.
It is possible that AI bans will be an orphan issue like animal rights, which neither major political party supports.
Can you please try to expand on your reasoning? How do you get from a flat 0 (a race to AGI) to 10-50 percent? What causes the probability shift? There is no scientific or empirical evidence for AGI dangers at this time, just a bunch of convincing arguments without proof.
Sure. I think there are natural reasons for people to fear AI. It will probably take their job, and therefore their ability to earn income through work. There is also a sizable portion of intellectuals who think that AI will probably lead to human extinction if we do not take drastic measures, and these intellectuals influence policy.
Humans tend to be fairly risk-averse about many powerful new technologies. For example, many politicians are currently seeking to strictly regulate tech companies out of traditional concerns regarding the internet and computers, which I personally find kind of baffling. AIs will also be pretty alien and AIs seem likely to take over management of the world if we let them have that type of control.
Environmentalists might fear that uncontrolled AI growth will lead to an environmental catastrophe. Cultural conservatives could fear the decay of traditional values in a post-AGI world. We could go through a list of popular ideologies and find similar reasons for fear in most of them.
It doesn’t seem surprising, given all these factors, that people will want to put a long pause on AI, even given the incentives to race to the finish line. The status quo is well-guarded, albeit against a formidable foe. If that reasoning doesn’t get you above 10% chance on a >10 year AI delay, then I’m honestly a bit surprised.
The 0 is because it’s a worldwide AI pause. EU AND UK AND China AND Russia AND Israel AND Saudi Arabia AND USA AND Canada AND Japan AND Taiwan.
To name all the parties that would be capable of competing even in the face of sanctions. Russia maybe doesn’t belong on the list, but if the AI pause had no effective controls (someone sells inference and training accelerators and Russia can buy them), then there is no pause.
Let’s see: 10 parties. If each of them independently decides on AI pausing with a 20 percent chance, that’s 0.2^10 = a number that’s basically 0.
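(A minimal sketch of that arithmetic, mainly to make the independence assumption explicit; the replies below dispute that assumption.)

```python
# The "basically 0" figure assumes the 10 parties decide independently.
p_pause_each = 0.2
n_parties = 10

# Fully independent decisions: multiply the individual probabilities.
p_all_pause_independent = p_pause_each ** n_parties
print(p_all_pause_independent)  # ~1.024e-07, essentially zero

# At the opposite extreme, perfectly correlated decisions (everyone responding
# to the same shared facts and incentives) would give a probability of 0.2.
# Real-world correlation presumably lies somewhere between these bounds.
```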
Another issue: you might think “peer pressure” would couple the decisions together. Except, think about the gain if you defect. It rises the greater the number of AI pausers. If you are the only defector, you take the planet and have a high chance of winning.
The only thing an AI pauser can do if they find out too late is threaten to nuke; their conventional military would be slaughtered by drone swarms. But the parties I mentioned all either have nuclear arsenals now or can build one in 1-2 years (Saudi Arabia can’t, but the others can). And that’s without the help of AGI to mass-produce the missiles.
So the pauser parties have, in this scenario, a choice between “surrender to the new government and hope it’s not that bad” and “death of the entire nation”. (Or they anticipate facing this choice and defect, which is what superpowers will do.)
Do “worldwide AI pause” and “game-winning defector advantage” change your estimate from 10-40 percent?
My other comment is that even if you focus on just the USA and just the interest groups you mentioned: what about money? Annual 2023 AI investment is at least 100+ billion USD, and may be over 200 billion if you simply look at Nvidia revenue increases and project forward. Just 1 percent of that money is a lot of lobbying. (Source: Stanford estimates 2022 investment at 91 billion. There’s been a step-function increase since the late-2022 release of good LLMs. I’m not sure of the totals for 2023, but it has doubled Nvidia’s quarterly revenue.)
Where can the pausers scrape together a few billion? US politics is somewhat financing-dependent: a side needs money to get a voice.
For example, the animal rights topic here is not supported in a meaningful way by any mainstream party.
Drone swarms do take time to build. Also, nuclear war is “only” going to kill a large percentage of your country’s citizens; if you’re sufficiently convinced that any monkey getting the banana means Doom, then even nuclear war is worth it.
I think getting the great powers on-side is plausible; the Western and Chinese alliance systems already cover the majority. Do I think a full stop can be implemented without some kind of war? Probably not. But not necessarily WWIII (though IMO that would still be worth it).
I don’t think you should treat these probabilities as independent. I think the intuition that a global pause is plausible comes from these states’ interest in a moratorium being highly correlated, because the reasons for wanting a pause are based on facts about the world that everyone has access to (e.g. AI is difficult to control) and motivations that are fairly general (e.g. powerful, difficult-to-control influences in the world are bad from most people’s perspective, and the other things that Matthew mentioned).
See the next sentence I wrote. They aren’t independent, but the kinds of groups Matthew mentions (people concerned about their jobs, etc.) are not the same percentage in every country. They would have to be the majority in all 10.
And then in some of those countries the population effectively gets no vote. So the ‘central committee’ or whatever government structure they use also has to decide on a reason not to build AGI, and it has to be a different reason, because such a committee faces different incentives.
And then there’s the defector’s prize. There’s no real benefit to racing for AGI if you’re far behind: you won’t win the race, and you should just license the tech when it’s out. Focus on your competencies so you have the money to do that.
Note also that we can simply look at where the wind is blowing today. What are China, Israel, Russia, and other parties saying? They are saying they are going to make an AGI at the earliest opportunity.
What is the probability that, without direct evidence of the danger of AGI (by building one), they will all change their minds?
Matthew is badly miscalibrated. The chances are near 0 that they all change their minds, for different reasons. There are no examples in human history where this has ever happened.
Humans are tool users, and you’re expecting that they will leave a powerful tool fallow after having spent 70 years of exponential progress to develop it. That’s not going to happen. (If you thought it would always blow up, like an antimatter bomb, that would be a different situation, but current AI systems that are approaching human level don’t have this property, and larger multimodal ones likely will not either.)